Sequential Feature Selection


Statistical Inference for Sequential Feature Selection after Domain Adaptation

Loc, Duong Tan, Loi, Nguyen Thang, Duy, Vo Nguyen Le

arXiv.org Machine Learning

In high-dimensional regression, feature selection methods, such as sequential feature selection (SeqFS), are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring knowledge from a related source domain to a target domain, improving generalization performance. Although SeqFS after DA is an important task in machine learning, none of the existing methods can guarantee the reliability of its results. In this paper, we propose a novel method for testing the features selected by SeqFS-DA. The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05). Additionally, a strategic approach is introduced to enhance the statistical power of the test. Furthermore, we provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared. Extensive experiments are conducted on both synthetic and real-world datasets to validate the theoretical results and demonstrate the proposed method's superior performance.


Fast Classification with Sequential Feature Selection in Test Phase

Mirzaei, Ali, Pourahmadi, Vahid, Sheikhzadeh, Hamid, Abdollahpourrostam, Alireza

arXiv.org Artificial Intelligence

This paper introduces a novel approach to active feature acquisition for classification: the task of sequentially selecting the most informative subset of features to achieve optimal prediction performance during testing while minimizing cost. The proposed approach involves a new lazy model that is significantly faster and more efficient than existing methods while producing comparable accuracy. During the test phase, the approach uses Fisher scores for feature ranking to identify the most important feature at each step. The training dataset is then filtered based on the observed value of the selected feature, and the process is repeated until an acceptable accuracy is reached or the feature-acquisition budget is exhausted. The performance of the proposed approach was evaluated on synthetic and real datasets, including our new synthetic CUBE dataset and the real-world Forest dataset. The experimental results demonstrate that our approach achieves accuracy competitive with existing methods while significantly outperforming them in speed. The source code is released on GitHub: https://github.com/alimirzaei/FCwSFS.
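The rank-then-filter loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the Fisher-score formula is the standard one (between-class over within-class variance), while the filtering tolerance `tol`, the majority-vote fallback, and the stopping rules are assumptions made for the sketch.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class variance over within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        num += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        den += n_c * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)  # guard against zero within-class variance

def classify_sequential(x, X_train, y_train, budget, tol=0.5):
    """Lazy sequential acquisition (sketch): at each step, rank the remaining
    features by Fisher score on the current candidate set, observe the test
    point's value for the top-ranked feature, and keep only training samples
    whose value for that feature lies within tol of the observation."""
    observed = set()
    X_cur, y_cur = X_train, y_train
    for _ in range(budget):
        if len(np.unique(y_cur)) < 2:
            break  # candidate set is pure; no further discrimination needed
        scores = fisher_scores(X_cur, y_cur)
        for j in observed:
            scores[j] = -np.inf  # never re-acquire a feature
        j = int(np.argmax(scores))
        observed.add(j)
        mask = np.abs(X_cur[:, j] - x[j]) <= tol
        if not mask.any():
            break  # no similar training samples left; vote on the current set
        X_cur, y_cur = X_cur[mask], y_cur[mask]
    # majority vote among the remaining candidate samples
    vals, counts = np.unique(y_cur, return_counts=True)
    return vals[np.argmax(counts)]
```

The lazy aspect is that no global model is trained: each test point triggers its own sequence of rankings over a progressively smaller candidate set, which is what makes per-query feature acquisition cheap.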


A Practical Introduction to Sequential Feature Selection

#artificialintelligence

Sequential feature selection is a supervised approach to feature selection. It makes use of a supervised model and can be used either to remove useless features from a large dataset or to select useful features by adding them one at a time. Starting with a single feature and then adding others is the forward approach; there is also a backward approach, which starts from all the features and removes the least relevant ones according to the same criterion. Since at each step we evaluate the model on the same dataset with each remaining feature added (one by one) and keep the best, it is a greedy approach. The algorithm stops when the desired number of features is reached or when performance no longer improves above a certain threshold.
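The forward procedure described above can be sketched as a minimal greedy loop. The R² scorer, the `min_gain` stopping threshold, and the function names here are illustrative choices for the sketch; in practice, scikit-learn's `SequentialFeatureSelector` implements the same idea with cross-validated scoring and an arbitrary estimator.

```python
import numpy as np

def r2_lstsq(X, y):
    """R-squared of an ordinary least-squares fit (with intercept) on (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_selection(X, y, max_features, min_gain=1e-3):
    """Greedy forward SFS: start empty, at each step add the single feature
    that most improves R-squared, and stop when max_features is reached or
    the best improvement falls below min_gain."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # score every candidate feature added to the current selection
        gains = [(r2_lstsq(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(gains)
        if selected and score - best < min_gain:
            break  # performance no longer improves above the threshold
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected
```

The backward variant is symmetric: start from all features and, at each step, drop the one whose removal hurts the score least, stopping by the same criteria.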